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1. Introduction 

Given n observations, each falling in one of m bins, wc would like to test if these 
observations are consistent with having arisen as independent and identically 
distributed (i.i.d.) draws from a specified probability distribution po over the to 
bins (po is known as the "model" ) . A natural measure of the deviation between 
Po and the observations is the square x a of the Euclidean distance between the 
actually observed distribution of the draws and the expected distribution pa, 
that is, 

tn 

Xa = ^2((ya)k - (Po)kf, (1) 
fc=l 

where (y a )i, (2/0)2, ■ • • , {Ua)m are the proportions of the n observations falling 
in bins 1, 2, . . . , m, respectively. 

The "P-value" is then defined to be the probability that Xq > x a , where Xq 
would be the same as x a , but constructed from n draws that definitely are taken 
i.i.d. frompo; that is, 

m 

X o = X](( y o)fc - (Po)k) 2 , (2) 
fc=i 

where (Yb)i, (^0)2, (Yo)m are the proportions of n i.i.d. draws from p 
falling in bins 1, 2, . . . , to, respectively. When calculating the P-value — the 
probability that Xq > x a — we view Xq as a random variable while viewing x a 
as a fixed number. If the P-value is small, then we can be confident that the 
observed draws were not taken i.i.d. from the model pq. 

To characterize the statistical power of the P-value based on the Euclidean 
distance, we consider n i.i.d. draws from the alternative distribution 



Pa =Pa+ a/Vn, 



(3) 
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where a is a vector whose m entries satisfy X^feLi a k = 0- We thus need to 
calculate the distribution of the square X a of the Euclidean distance, 

n l 

X a = ]T((Fa) fe - ( P0 ) fc ) 2 , (4) 
k=l 

where (Y a )i, (V a )2, (Y a ) m are the proportions of n i.i.d. draws from p a 
falling in bins 1, 2, . . . , m, respectively. Section 4 below provides an efficient 
method for calculating the cumulative distribution function (cdf) of n ■ X a 
in the limit that the number n of draws is large. Section 5 below then de- 
scribes how to use such a method to plot the cdf of the P-values; this cdf is the 
same as the statistical power function of the hypothesis test based on the Eu- 
clidean distance (as a function of the significance level). Presenting this method 
is the principal purpose of the present paper, complementing the earlier dis- 
cussions of Perkins, Tygert, and Ward (2011b) and Perkins, Tygcrt, and Ward 
(2011a), which compare the Euclidean distance with classical statistics such as 
X 2 , the log-likelihood-ratio G 2 , and other members of the Cressie-Read power- 
divergence family; Perkins, Tygert, and Ward (2011b) and Perkins, Tygert, and Ward 
(2011a) review the classical statistics and provide detailed comparisons. 

As reviewed, for example, by Kendall et al. (2009) and Rao (2002), m-n- X a 
defined in (4) converges in distribution to a noncentral \ 2 in the limit that 
the number n of draws is large, when the model po is a uniform distribution. 
When po is nonuniform, m-n- X a converges in distribution to the sum of the 
squares of independent Gaussian random variables in the limit that the number 
n of draws is large, as shown by Moore and Spruill (1975) and reviewed in 
Section 2 below. Section 3 provides integral representations for the cdf of the sum 
of the squares of independent Gaussian random variables and applies suitable 
quadratures for their numerical evaluation. Section 4 summarizes the numerical 
method obtained by combining Sections 2 and 3. Section 5 summarizes a scheme 
for plotting the asymptotic power (as a function of the significance level) using 
the method of Section 4. Section 6 illustrates the methods via several numerical 
examples. 

The extension to models with nuisance parameters is straightforward, follow- 
ing Perkins, Tygcrt, and Ward (2011c); the present paper focuses on the simpler 
case in which the model po is a single, fully specified probability distribution. 

2. Preliminaries 

This section states Theorem 2.1, which is a special case of Theorem 4.2 of Moore and Spruill 
(1975). Before stating the theorem, we need to set up some notation. The set-up 
amounts to an algorithm for computing the real numbers <j\, 02, . . . , <J m -i and 
Ci) C2i ■ • • j Cm-i used in Theorem 2.1, where m is an integer greater than 1. 

First, we aim to define the positive real numbers a±, 02, . . . , <J m -i, given any 
to x 1 vector p whose entries are all positive. We define D to be the diagonal 
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m x to matrix 

D jt k = < <»>)*> ■>-- 

for j, k = 1, 2, . . . , to. We define H to be the m x m matrix 

Hi,k = \ 3= , k ( 6 ) 



for j, k — 1, 2, to. Note that is an orthogonal projector. We define 
B = HDH, so that B is the self-adjoint to x to matrix 

1 y-\m 1 ■ 7 



A- 



for j, k = 1, 2, . . . , m. As a self-adjoint matrix whose rank is m — 1 (after all, 
£> = HDH , H is an orthogonal projector whose rank is m — 1, and D is a 
full-rank diagonal matrix), B given in (7) has an eigendecomposition 

B = QAQ T , (8) 

where Q is a real unitary m x m matrix and A is a diagonal to x to matrix such 
that A„ ijTO = 0. Finally, we define the positive real numbers oi, <T2, . . . , <t to _i 
via the formula 

°l = 1/A fclfc (9) 

for fc = 1, 2, . . . , to — 1, where Ai.i, A2.2, ■ • ■ , A mim are the diagonal entries of 
A from the eigendecomposition (8). 

Next, we define the real numbers £1, C2, ■ • ■ , Cm— 1, given both po and an 
to x 1 vector a such that J^feLi a k = 0. We define the (to — 1) x 1 vector 

r? = Q T a, (10) 

where Q is the leftmost to x (to — 1) block of Q from the eigendecomposition (8), 
that is, Q is the same as Q after deleting the last column of Q. We can then 
define the real numbers £i, (2, ■ • ■ , Cm— 1 via the formula 

(k = Vk/&k (11) 

for k = 1, 2, . . . , m — 1, where 77 is defined in (10) and er is defined in (9). 

With this notation, we can state the following special case of Theorem 4.2 
of Moore and Spruill (1975). 

Theorem 2.1. Suppose that to is an integer greater than one, po is a probability 
distribution over to bins (that is, po is an m x 1 vector whose entries are all 
positive and $3&=i.(po)k = o- ^ an m x I vector such that 5Zfe=i a k = 0, and 
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(Y n )i, (Y n )2, ■ ■ ■ , (Y n ) m are the proportions of draws falling in bins 1, 2, . . . , 
m, respectively, out of a total of n i.i.d. draws from the probability distribution 

Pa=Po+ a/y/fi. (12) 

Suppose further that X n is the random variable 

m 

Xn = nY,((Yn)k~(po)k) 2 - (13) 

fe=l 

Then, X n converges in distribution to the random variable 

m— 1 

x x = J2 ol (z k . + a) 2 (w) 
fe=i 

as n becomes large, where Z\, Z2, Z m -\ are i.i.d. Gaussian random vari- 
ables of zero mean and unit variance, o\, 02, Cm-1 are the positive real 
numbers defined in (9), and £1, £2, Cm-1 are the real numbers defined 

in (11). The values of o~\, 02, o~ m -\ do not depend on the vector a; the 
values of C,\, C2, ■ ■ ■ , (m-l do depend on a. 

Remark 2.2. The mxm matrix B defined in (7) is the sum of a diagonal matrix 
and a low-rank matrix. The methods of Gu and Eisenstat (1994, 1995) for com- 
puting the eigenvalues of such a matrix B and computing the result of applying 
Q T from (8) to an arbitrary vector require only either 0{m 2 ) or 0(mlog(m)) 
floating-point operations. The 0{m 2 ) methods of Gu and Eisenstat (1994, 1995) 
are usually more efficient than the C(mlog(m)) method of Gu and Eisenstat 
(1995), unless m is unpractically large. 

3. Integral representations 

This section describes efficient algorithms for evaluating the cdf of the sum (14) 
of the squares of independent Gaussian random variables. The bibliography 
of Duchesne and de Micheaux (2010) gives references to possible alternatives to 
the methods of the present section. Our principal tool is the following theorem, 
representing the cdf as an integral suitable for evaluation via quadratures (see, 
for example, Remark 3.2 below); the theorem expresses formula 7 of Rice (1980) 
in the same form as formula 8 of Perkins, Tygert, and Ward (2011b). 

Theorem 3.1. Suppose that £ is a positive integer, Z\, Z2, Ze are i.i.d. 
Gaussian random variables of zero mean and unit variance, and o~\, 02, ■ ■ ■ , o~i 
and £1, ^2; • • • ; Ci are rea l numbers. Suppose in addition that X is the random 
variable 

X = J2crl(Z k + ( k )\ (15) 

k=l 
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Then, the cdf F of X is 



F(x) = 



Im 



k=l 



/or any positive real number x, where 



Wfc(y) = 1 - 2(y - l)o%/x + 2iyo%y/i/x, 



dy (16) 



(17) 



and F[x) = 0/or any nonpositive real number x. The square roots in (16) denote 
the principal branch, and Im takes the imaginary part. 

Remark 3.2. An efficient means of evaluating (16) numerically is to employ 
adaptive Gaussian quadratures; see, for example, Section 4.7 of Press et al. 
(2007). Good choices for the lowest orders of the quadratures used in the adap- 
tive Gaussian quadratures are 10 and 21, for double-precision accuracy. 

The remainder of the present section (particularly Remark 3.5) discusses 
the numerical stability of the method of Remark 3.2 and recalls an alternative 
integral representation suitable for use when the method of Remark 3.2 is not 
guaranteed to be numerically stable. The following lemma, proven in Remark 3.2 
of Perkins, Tygert, and Ward (2011b), ensures that the denominator in (16) is 
not too small. 

Lemma 3.3. Suppose that £ is a positive integer, and r\, r%, . . . , rg and y are 
positive real numbers. Suppose further that (in parallel with formula (17) above) 



Wk = l- r k (y - 1) + r k iyVI 



fork=\, 2, 
Then, 



n 

fc=i 



> e 



-1/4 



(18) 



(19) 



The following lemma ensures that the numerator in (16) is not too large, 
provided that e^ fc / 2 is not large. 

Lemma 3.4. Suppose that r, y, and £ are positive real numbers and (in parallel 
with formulae (17) and (18) above) 



1 — r(y — 1) + riyVl. 



Then, 



Proof. Defining 



w 



< 



1 

z = — 
V 



(20) 



(21) 



(22) 
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and 



wc obtain that 



It follows from (24) that 



r 

1 — w 1 — z — i\fl 



1 — w 



1 — CZ — iyfl 

(l-z) 2 +£ 



(1 - cz) 2 + £' 

It follows from (22) that z > and from (23) that c > 1, and hence 

cz — 1 > z — 1. 

If z > 1, then (26) yields that 

(cz-l) 2 >(z-l) 2 , 

which in turn yields that 



(l-z) 2 +^ < (l-z) 2 + 



1. 



{l-cz) 2 +£ ~ (l-z) 2 + £ 
If z < 1. then (recalling that z > 0, too) 

(1 - z) 2 +£ < (1 - z) 2 + £ l+£ 



{l-cz) 2 +£ 
We see from (28) and (29) that, in all cases, 

(i-zf + e l 

(l-cz) 2 +^- £' 



(23) 
(24) 

(25) 

(26) 
(27) 

(28) 

(29) 

(30) 
□ 



Combining (25) and (30) yields (21). 

Remark 3.5. The bound (19) shows that the integrand in (16) is not too large 
for any nonnegative y, provided that the numerator of (16) is not too large. An 
upper bound on the numerator follows immediately from (21): 



II' 

k=l 



,C k Ci--w k {y))/{2w k {y)) 



< 



n 

fc=i 



tly/T+TJl/2 



(31) 



For any particular application, wc can check that the right-hand side of (31) is 
not too many orders of magnitude in size, guaranteeing that applying quadra- 
tures to the integral in (16) cannot lead to catastrophic cancellation in floating- 
point arithmetic. Naturally, it is also possible to check on the magnitude of the 
integrand in (16) during its numerical evaluation, indicating even better nu- 
merical stability than guaranteed by our a priori estimates. See Theorem 3.7 
and Remark 3.8 below for an alternative integral representation suitable for use 
when the right-hand side of (31) is large. 
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Remark 3.6. The bound in (31) is quite pessimistic. In fact, the real part of 
(1 — Wk{y)) I (2Wk(y)) is often nonpositive, so that 



,<k(l-™k(y))/(2w k (y)) 



< 1. 



(32) 



If the right-hand side of (31) is large, then we can use the method of Imhof 
(1961), Davies (1980), and others, applying numerical quadratures to the inte- 
gral in the following theorem. Please note that the integrand in the following 
theorem decays reasonably fast when the right-hand side of (31) is large. 

Theorem 3.7. Suppose that £ is a positive integer, Z\, Z2, ■ Z% are i.i.d. 

Gaussian random variables of zero mean and unit variance, and o~\, oi 7 ■ ■ ■ , &i 
and £1, (2, • • • ; Ci are rea l numbers. Suppose in addition that X is the random 
variable 



Then, the cdf F of X is 



k=l 



-ill 



Im 



n 



k=l 



Cl(l-v k (y))/(2v k (y)) 



*y IL=i VMv) 



dy 



for any positive real number x, where 

v k(y) = 1 - 2iyo%/x, 



(33) 



(34) 



(35) 



and F(x) = for any nonpositive real number x. The square roots in (34) denote 
the principal branch, and Im takes the imaginary part. 

Remark 3.8. The integrand in (34) is not too large (except for values of y 
that are closer to than are typical quadrature nodes), since the real part of 
(1 — Vk(y)) / (2vk(y)) is always nonpositive, so that 



,C*(l-w*(l/))/(2f*(j/)) 



< 1. 



(36) 



Moreover, the numerator in (34) decays reasonably fast (it is sub-Gaussian) 
when the right-hand side of (31) is large. 



4. Numerical method 

Combining Sections 2 and 3 yields an efficient method for calculating the cdf F 
of n times the square of the Euclidean distance between the model and empirical 
distributions, in the limit that n is large, when the n observed draws are taken 
i.i.d. from an alternative distribution p a = po + a/y/n (as always, po is the model 
— a probability distribution over m bins — and a is a vector whose m entries 
satisfy YlT=i ak ~ 0)- Indeed, Theorem 2.1 shows that the desired F is the same 
as that in (16) and (34), with the real numbers a\, 02, . . . , an and £1, (2, • • • , Ct 



Computing the asymptotic power of a Euclidean- distance test for goodness-of-fit 



9 



calculated as detailed in Section 2 (identifying I = m — 1). Remark 3.2 describes 
an efficient means of evaluating F(x) in (16) that is numerically stable when 
the right-hand side of (31) is not too many orders of magnitude in size. When 
the right-hand side of (31) is many orders of magnitude in size, we can apply 
quadratures to the representation of F(x) in (34) instead (see Remark 3.8). 

5. Plotting the asymptotic statistical power 

Let us denote by tt the cdf of the P-values for the Euclidean distance (or, equiv- 
alcntly, for any positive multiple of the square of the Euclidean distance); 7r is 
also the statistical power function of the hypothesis test based on the Euclidean 
distance (as a function of the significance level). The method of Section 4 is 
sufficient for plotting tt in the limit that the number of draws is large. Indeed, 
suppose that X denotes n times the square of the Euclidean distance between 
the model and empirical distributions, Fq denotes the cdf for X when taking 
n draws i.i.d. from the model probability distribution p$, and F a denotes the 
cdf for X when taking n draws i.i.d. from p a = Po + a/y/n, where a is a vector 
whose m entries satisfy JZfeLi a k = 0. The P-value P equals 1 — F n (X), in the 
limit that n is large, and then the cdf it of the P-values for draws from p a is 



tt(1 - F (x)) = Prob{P < 1 - Fo(x)} = Prob{l - F {X) < 1 - F (x)} 



for any nonnegative real number x] thus, the graph of all points (a, 7r(a)) with 
a ranging from to 1 is the same as the graph of all points (1 — Fq(x), 1 — F a (x)) 
with x ranging from to oo, in the limit that n is large. Section 4 describes how 
to evaluate Fq(x) and F a (x) for any real number x, in the limit that the number 
n of draws is large; note that Fq{x) = F a (x) when the entries of a are all zeros, 
so the procedure of Section 4 can evaluate Fq(x) as well as F a {x). When the 
entries of a are all zeros, £i = = ■ ■ ■ = Ce = in the method of Section 4, and 
then the right-hand side of (31) is exactly 1. 

6. Numerical examples 

This section illustrates the algorithms of the present paper via several numerical 
examples. As detailed in the subsections below, we consider three examples for 
the model po (as always, po is a probability distribution over m bins, that is, a 
vector whose entries are all positive and satisfy X)fcLi(-Po)fe = 1), taking n i.i.d. 
draws from the alternative probability distribution 



where a is a vector whose m entries satisfy J2T=i a k = (the subsections below 
detail several examples for a). Figure 1 plots the cdf tt of the P-values for n i.i.d. 
draws taken from the alternative distribution p a , when n is large; 7r is also the 



Prob{X > x} = 1 - F a (x) (37) 




(38) 
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statistical power function of the hypothesis test based on the Euclidean distance 
(as a function of the significance level). For each of the examples, Figure 1 plots 
the cdf 7r both for n = 1,000,000 draws (computed via Monte-Carlo simulations) 
and in the limit that n is large (computed via the algorithms of the present 
paper); not surprisingly, there is little difference between the plots for n = 

1.000. 000 and for the limit that n is large. The lines in Figure 1 corresponding 
to n = 1,000,000 draws are colored green; the lines corresponding to the limit 
of large n are black. 

Remark 6.1. For each example, we computed the cdf it for n = 1,000,000 
draws via 40,000 Monte-Carlo simulations. A straightforward argument based on 
the binomial distribution, detailed in Remark 3.4 of Perkins, Tygert, and Ward 
(2011a), shows that the standard errors of the resulting estimates of the P-values 
P are equal to y/P(l — P)/40000 < 0.0025, ensuring that the standard errors 
of the plotted abscissae a for the green points in Figure 1 are approximately 
y/a(l -a)/40000 < 0.0025 (roughly the size of the radii of the plotted points). 

Remark 6.2. For each example, we plotted the cdf 7r in the limit of a large 
number n of draws via the scheme of Section 5. Figure 1 displays the points 
{a, 7r(a)) = (1 - F Q {x), 1 - F a (x)) for the 10000 values x = 1/2000, 2/2000, 
10000/2000, in the limit that the number n of draws is large, where Fq(x) and 
F a (x) are defined in Section 5 and computed to at least 6-digit accuracy via the 
method of Section 4. 

Table 1 summarizes computational costs of the procedure described in Sec- 
tion 4. The headings of Table 1 have the following meanings: 

• m is the number of bins in the probability distributions po and p a . 

• qo is the maximum number of quadrature nodes required in any of the 
10000 evaluations of Fq plotted in Figure 1 (Section 5 defines Fq), using 
adaptive Gaussian quadratures as described in Remark 3.2. 

• q a is the maximum number of quadrature nodes required in any of the 
10000 evaluations of F a plotted in Figure 1 (Section 5 defines F a ), using 
adaptive Gaussian quadratures as described in Remark 3.2. 

• t is the time in seconds required to perform the quadratures for both 
Fq(x) and F a (x) at a single value of x, amortized over the 10000 pairs 
(1 — Fq(x), 1 — F a (x)) plotted in Figure 1 (Section 5 defines F a and F a ). 

6.1. Uniform model 

For our first example, we take 

(po)fc = 1/10 (39) 

for k = 1, 2, . . . , 10, and take 

a k = (-l) fc /5 (40) 

for k = 1, 2, . . . , 10. The Euclidean distance is equivalent to the canonical \ 2 
statistic for this example, since po is a uniform distribution. 
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6.2. Nonuniform model 

For our second example, we take 



1/198, fc = 2,3,...,100 (41) 



for k = 1, 2, . . . , 100, and take 



- / 2 / 3 ' k = 1 (AO\ 

ak ~ \ -2/297, fc = 2,3,...,100 1 > 

for k = 1, 2, 100. 

6.3. Poisson model 

For our third example, we take 

(p ) fc -e- 3 3 fe - 1 /(fc-l)! (43) 
for fc = 1, 2, 3, . . . , and take 

f (-l) fc /4, fc = 1,2,3,4 
a fc = < (-l) fe /2, ^ = 5,6 (44) 
[ 0, k = 7,8,9,... 

for k = 1, 2, 3, . . . . For all numerical computations associated with this example, 
we can truncate to the first 20 bins, since Ylk > =2i(Po)k < 10~ 10 . 

6.4- Poisson model with a different alternative 

For our fourth example, we again take 

(po)fc = e- 3 3 fe - 1 /(fc-l)! (45) 
for k = 1, 2, 3, . . . , but now take 

r i, k = i 

a k =\ -1/11, fc = 2,3,...,12 (46) 
[ 0, k = 13,14,15,... 

for k = 1, 2, 3, For all numerical computations associated with this example, 

we can truncate to the first 20 bins, since X]fe°=2i(Po)fc < 10~ 10 . 

Remark 6.3. The right-hand side of (31) is 8.233 for Subsection 6.1, 2.443 
for Subsection 6.2, and 24.05 for Subsection 6.3. As discussed in Remark 3.5, 
roundoff errors in the numerical evaluation of (16) are therefore guaranteed to 
be negligible for the standard floating-point arithmetic (the mantissa in the 
standard, double-precision arithmetic has a dynamic range of about 5 • 10 15 3> 
24.05). The right-hand side of (31) is 1.478 • 10 16 for Subsection 6.4, so we 
used (34) rather than (16) for the last example (Remark 3.8 explains why). 
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Table 1 
Computational costs 





m 


90 


Qa 


t 


example 


1 


10 


230 


230 


0.006 


example 


2 


100 


530 


550 


0.090 


example 


3 


20 


250 


330 


0.013 


example 


i 


20 


350 


350 


0.010 



We used Fortran 77 and ran all examples on one core of a 2.2 GHz Intel 
Core 2 Duo microprocessor with 2 MB of L2 cache. Our code is compliant with 
the IEEE double-precision standard (so that the mantissas of variables have 
approximately one bit of precision less than 16 digits, yielding a relative preci- 
sion of about 2 ■ 1CP 16 ). We diagonalized the matrix B defined in (7) using the 
Jacobi algorithm (see, for example, Chapter 8 of Golub and Van Loan (1996)), 
not taking advantage of Remark 2.2; explicitly forming the entries of the matrix 
B defined in (7) can incur a numerical error of at most the machine precision 
(about 2 • 10~ 16 ) times maxi<fc< TO (po)fc/ mini< fe < m (p )fc, yielding 6-digit accu- 
racy or better for all our examples. A future article will exploit the interlacing 
properties of eigenvalues, following Gu and Eisenstat (1994), to obtain higher 
precision. Of course, even 4-digit precision would suffice for most statistical ap- 
plications; however, modern computers can produce high accuracy very fast, as 
the examples in this section illustrate. 
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Fig. 1: Cumulative distribution functions of the P-values P for draws from the 
alternative distributions defined in Subsections 6.1-6.4 
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